Tensors, Learning, and 'Kolmogorov Extension' for Finite-alphabet Random Vectors

Authors

  • Nikos Kargas
  • Nikos D. Sidiropoulos
  • Xiao Fu
Abstract

Estimating the joint probability mass function (PMF) of a set of random variables lies at the heart of statistical learning and signal processing. Without structural assumptions, such as modeling the variables as a Markov chain, tree, or other graphical model, joint PMF estimation is often considered mission impossible – the number of unknowns grows exponentially with the number of variables. But who gives us the structural model? Is there a generic, ‘non-parametric’ way to control joint PMF complexity without relying on a priori structural assumptions about the underlying probability model? Is it possible to discover the operational structure without biasing the analysis up front? What if we only observe random subsets of the variables – can we still reliably estimate the joint PMF of all of them? This paper shows, perhaps surprisingly, that if the joint PMF of any three variables can be estimated, then the joint PMF of all the variables can be provably recovered under relatively mild conditions. The result is reminiscent of Kolmogorov’s extension theorem: consistent specification of lower-order distributions induces a unique probability measure for the entire process. The difference is that for processes of limited complexity (rank of the high-order PMF), complete characterization is possible from third-order distributions alone. In fact, not all third-order PMFs are needed, and under more stringent conditions even second-order ones will do. Exploiting multilinear (tensor) algebra, this paper proves that such higher-order PMF completion can be guaranteed, and several pertinent identifiability results are derived. It also provides a practical and efficient algorithm to carry out the recovery task. Judiciously designed simulations and real-data experiments on movie recommendation and data classification are presented to showcase the effectiveness of the approach.
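The low-rank viewpoint in the abstract can be sketched numerically: under a naive-Bayes (latent-variable) model, the joint PMF is a rank-R CP tensor, every third-order marginal is built from the same factor matrices, and the parameter count is linear rather than exponential in the number of variables. The sizes and variable names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: N discrete variables, each with I states, and a joint
# PMF of CP rank R (equivalently, a naive-Bayes model with an R-state latent
# variable H).
N, I, R = 5, 4, 3

# Latent prior lam and per-variable conditionals A[n][i, r] = P(X_n = i | H = r).
lam = rng.dirichlet(np.ones(R))
A = [rng.dirichlet(np.ones(I), size=R).T for _ in range(N)]  # each I x R

# Full joint PMF as a rank-R CP tensor:
# P(i1,...,iN) = sum_r lam_r * prod_n A[n][i_n, r]
P = np.einsum('r,ar,br,cr,dr,er->abcde', lam, *A)
assert np.isclose(P.sum(), 1.0)

# Any third-order marginal is itself rank-R with the SAME factors, e.g. for
# (X_0, X_1, X_2) – this is why third-order marginals can pin down the model:
P_012 = np.einsum('r,ar,br,cr->abc', lam, A[0], A[1], A[2])
assert np.allclose(P_012, P.sum(axis=(3, 4)))

# Parameter count: linear in N for the CP model vs exponential for the table.
cp_params = R + N * I * R   # lam plus the N factor matrices
raw_params = I ** N         # full joint PMF table
print(cp_params, raw_params)  # 63 vs 1024
```

The sketch only verifies the multilinear structure; the paper's contribution is the converse direction, recovering the factors (and hence the full joint PMF) from the third-order marginals under identifiability conditions.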

Similar articles

An extension theorem for finite positive measures on surfaces of finite-dimensional unit balls in Hilbert spaces

A consistency criterion is given for a certain class of finite positive measures on the surfaces of finite-dimensional unit balls in a real separable Hilbert space. It is proved, through a Kolmogorov-type existence theorem, that the class induces a unique positive measure on the surface of the unit ball in the Hilbert space. As an application, this will naturally accomplish the work of Kante...

LIMIT LAWS FOR SYMMETRIC k-TENSORS OF REGULARLY VARYING MEASURES

In this paper we establish the asymptotics of certain symmetric k–tensors whose underlying distribution is regularly varying. Regular variation is an asymptotic property of probability measures with heavy tails. Regular variation describes the power law behavior of the tails. Tensors and tensor products are useful in probability and statistics, see for example [7, 14, 17]. Random tensors are co...

A Generic Algorithm for Learning Symbolic Automata from Membership Queries

We present a generic algorithmic scheme for learning languages defined over large or infinite alphabets such as bounded subsets of N and R, or Boolean vectors of high dimension. These languages are accepted by deterministic symbolic automata that use predicates to label transitions, forming a finite partition of the alphabet for every state. Our learning algorithm, an adaptation of Angluin’s L∗...

The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms

In 1964 Kolmogorov introduced the concept of the complexity of a finite object (for instance, the words in a certain alphabet). He defined complexity as the minimum number of binary signs containing all the information about a given object that are sufficient for its recovery (decoding). This definition depends essentially on the method of decoding. However, by means of the general theory of al...

Journal:
  • CoRR

Volume: abs/1712.00205

Pages: –

Publication date: 2017